Fault Tolerance Assistant (FTA): An Exception Handling Programming Model for MPI Applications
Abstract
We propose FTA, a programming model that provides failure localization and transparent recovery of process failures in MPI applications.
Similar Resources
Fault Tolerance Assistant (FTA): An Exception Handling Programming Model for MPI Applications
Future high-performance computing systems may face frequent failures with their rapid increase in scale and complexity. Resilience to faults has become a major challenge for large-scale applications running on supercomputers, which demands fault tolerance support for prevalent MPI applications. Among failure scenarios, process failures are one of the most severe issues as they usually lead to t...
Implementing Coordinated Exception Handling for Distributed Object-Oriented Systems with AspectJ
Exception handling is a very popular technique for incorporating fault tolerance into software systems. However, its use for structuring concurrent, distributed systems is hindered by the fact that the exception handling models of many mainstream object-oriented programming languages are sequential. In this paper we present an aspect-based framework for incorporating concurrent exception handli...
Verification of Coordinated Exception Handling
An important challenge faced by the developers of fault-tolerant distributed systems is to build fault tolerance mechanisms that are reliable. To achieve the desired levels of reliability, the development of mechanisms for detecting and handling errors should be rigorous or formal. In this paper, we present an approach to modeling and verifying fault-tolerant distributed systems that use exceptio...
Implementing Coordinated Error Recovery for Distributed Object-Oriented Systems with AspectJ
Exception handling is a very popular technique for incorporating fault tolerance into software systems. However, its use for structuring concurrent, distributed systems is hindered by the fact that the exception handling models of many mainstream object-oriented programming languages are sequential. In this paper we present an aspect-based framework for incorporating concurrent exception handli...
Towards a Multi Agents System Coupling Replication and Exception Handling
Multi agents systems are formed of different independent entities placed in several machines. When an entity or an agent fails, it is the whole system that may be in a failure case. Through this paper, we will propose an approach that may guarantee fault tolerance in multi agents systems using two different techniques which are replication and exception handling. Replication uses redundancy to ...
Publication date: 2015